ABSTRACT

Because little is known about properties of lines fitted by
eye, we designed and carried out an empirical investigation.
Inexperienced graduate and post-doctoral students instructed to
locate a line for estimating y from x for four sets of points
tended to choose slopes near that of the first principal component
(major axis) of the data, and their lines passed close to the
centroids. Students had a slight tendency to choose consistently
either steeper or shallower slopes for all sets of data.

Principal components.

1. INTRODUCTION

The properties of least squares and other computed lines are
well understood, but surprisingly little is known about the
commonly used method of fitting by eye. This method involves
maneuvering a string, black thread, or ruler until the fit seems
satisfactory, and then drawing a line. We report one systematic
investigation of eye-fitting of lines.

Students  fitted  lines  by  eye  to  four  sets  of  points  given 
in  an  experimental  design  to  help  us  discover  the  properties  of 
their  fitted  lines  and  whether  order  of  fitting  or  practice  made 
a  difference.  Other  populations  of  subjects  may  produce  different 
results.  These  sets  of  data  were  not  unusual  in  curvature  or  in 
having  outlying  points  or  patterns.  Thus  additional  populations 
of  data  sets  could  profitably  be  investigated. 

The principal quantitative reference on fitting straight
lines by eye is Finney (1951). He found that a mathematical
iteration starting with slopes provided by scientists who were
inexperienced with probit analysis gave satisfactory
approximations to the relative potency in a bioassay.

2. METHOD

We conducted this experiment in a class of graduate and
post-doctoral students in introductory biostatistics. Most students
had not studied statistics before and had not yet been shown
formal methods for fitting lines. The idea of using a regression
line fitted to a set of points to estimate the vertical value,
y, from the horizontal value, x, had been illustrated in a
previous class session.

Each student was given the same set of four scatter diagrams
and an 8.5 x 11 inch transparency with a straight line etched
completely across the middle. Students moved the transparency over
the scatter diagram until satisfied with the fit of the etched
line, and then marked an x on the scatter diagram at each end
of the line. This transparency method is preferable to the
black-thread method, which requires three hands.

The four scatter diagrams were labeled S for standard, F
for fat, V for vertical, and N for negative; these are shown
in Figure A. Data sets S, F, and V are linear transformations
of each other, so that F has more vertical error than S
and V has a steeper slope than S. Data sets S, F, and
V come from a table of random numbers in Beyer (1971), whereas
data set N is a linear transformation of the fiber strength
data on page 224 of Dunn and Clark (1974).


To  assess  the  effect  of  the  order  of  presentation,  we  used 
a  Latin  square  design  with  packets  stapled  in  four  different 
orders:  SNFV,  NSVF,  FVSN,  and  VFNS.  We  distributed  them 
systematically  in  that  sequence  so  that  students  sitting  side 
by  side  had  different  kinds  of  packets.  We  laid  out  on  desks 
before  class  175  packets  and  collected  153  at  the  end  of  the  hour. 
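
The packet layout can be checked mechanically. The following Python
sketch is an illustration only and was not part of the original
procedure. It verifies that the four orders form a Latin square (each
data set appears exactly once in every position) and hands packets out
cyclically, which is one reading of "systematically in that sequence".
The function names and the cyclic seating rule are assumptions.

```python
# The four packet orders reported in the text.
ORDERS = ["SNFV", "NSVF", "FVSN", "VFNS"]


def is_latin_square(orders):
    """True if every symbol occurs once in each row and each column."""
    n = len(orders)
    symbols = set(orders[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in orders)
    cols_ok = all({row[i] for row in orders} == symbols for i in range(n))
    return rows_ok and cols_ok


def assign_packets(n_seats, orders=ORDERS):
    """Assumed rule: cycle through the orders along a row of seats."""
    return [orders[i % len(orders)] for i in range(n_seats)]


seats = assign_packets(6)
# Adjacent seats never receive the same kind of packet.
neighbours_differ = all(a != b for a, b in zip(seats, seats[1:]))
```

Because each data set occupies each position once, a position effect
(for example, practice) is balanced across the four data sets.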

3. RESULTS


Table 1 summarizes the averages, variabilities, and actual
(least squares) values for the slope and intercept of each data
set. We have reported medians and interquartile ranges to
reduce the effect of the few outlying values. The y-intercept
at x̄ (the mean of x) measures the height of a line as well as
does the y-intercept at zero, and is less correlated with the
slope. To get Table 1, we pooled results from the four orders of
presentation because we found no trend in the differences due to
order.
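
To illustrate why medians and interquartile ranges are less affected
by a few outlying values, here is a minimal Python sketch. The numbers
are made up for the illustration and are not the students' data, and
the quartile convention (the default of `statistics.quantiles`) is an
assumption, since the text does not say which convention was used.

```python
import statistics


def median_and_iqr(values):
    """Return (median, interquartile range) of a sequence of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


# Made-up values; the last one plays the role of an outlying fit.
values = [1, 2, 3, 4, 5, 6, 700]

med, iqr = median_and_iqr(values)
mean = statistics.mean(values)  # pulled far upward by the single outlier
```

Here the median and interquartile range are the same as they would be
if the last value were 7, whereas the mean is dominated by it.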


TABLE 1.

Averages, Variabilities, and Actual Values
for Slopes and Intercepts

                                   S           F           V           N

Slope
  median (interquartile range)    .70 (.04)   .84 (.14)  2.07 (.14)  -.73 (.20)
  actual
    least squares regression      .66         .66        1.98        -.70
    principal component           .68         .82        2.11        -.79

y-intercept at x̄
  median (interquartile range)   3.88 (.10)  3.86 (.17)  3.95 (.18)  4.04 (.24)
  actual least squares           3.88        3.90        3.89        4.11


Comparing the students' average slope to the actual slope,
we see that the slope of the least squares regression of y on
x is close to the average in each data set except F. One
possible explanation might be that students tended to fit the
slope of the first principal component or major axis (the line
that minimizes the sum of squares of perpendicular rather than
vertical distances). The principal component slope is closer
to the median slope in every case except N, and is notably
closer for F.
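
The two "actual" slopes being compared can be computed from the
centered sums of squares and products. The Python sketch below is an
illustration under stated assumptions, not the authors' computation:
the function names and the four sample points are hypothetical, and
the major-axis formula shown requires a nonzero sum of cross products.

```python
import math


def centered_sums(points):
    """Return (x_mean, y_mean, Sxx, Syy, Sxy) for a list of (x, y) pairs."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    syy = sum((y - y_mean) ** 2 for _, y in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    return x_mean, y_mean, sxx, syy, sxy


def least_squares_slope(points):
    """Slope minimizing the sum of squared vertical distances."""
    _, _, sxx, _, sxy = centered_sums(points)
    return sxy / sxx


def principal_component_slope(points):
    """Slope minimizing the sum of squared perpendicular distances.

    Direction of the leading eigenvector of the 2 x 2 matrix of
    centered sums; assumes Sxy != 0.
    """
    _, _, sxx, syy, sxy = centered_sums(points)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4.0 * sxy * sxy)) / (2.0 * sxy)


# Hypothetical points, chosen only so the arithmetic is easy to follow.
points = [(0, 0), (1, 1), (2, 0), (3, 3)]
b_ls = least_squares_slope(points)        # Sxy / Sxx = 4 / 5
b_pc = principal_component_slope(points)  # (1 + sqrt(65)) / 8
```

When the points do not lie exactly on a line, the major-axis slope is
larger in magnitude than the least squares slope of y on x, and the
gap widens as the scatter about the line grows.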

Because the y-intercept at x̄ is the same for the regression
and major axis lines, the conclusion here is simply that
the students placed their lines near the centroid of the
cloud of points in each case.

By  computing  the  correlation  matrix  for  the  students' 
slopes  for  the  four  data  sets,  we  see  in  Table  2  that  students 
who  gave  steep  slopes  for  one  data  set  also  tended  to  give 
steep  slopes  on  the  others.  This  effect  seems  slight  but  is 
definite.  The  negative  values  arise  because  data  set  N  has 
negative  slope. 
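
A correlation matrix of this kind can be formed from one list of
fitted slopes per data set, with students in the same order in every
list. The following Python sketch is illustrative only: the slopes are
made up for four imaginary students and are not the values behind
Table 2, and the function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def correlation_matrix(slopes_by_set):
    """Nested dict r[a][b] of correlations between data sets a and b."""
    names = list(slopes_by_set)
    return {
        a: {b: pearson(slopes_by_set[a], slopes_by_set[b]) for b in names}
        for a in names
    }


# Made-up slopes: each student is consistently shallower or steeper.
slopes_by_set = {
    "S": [0.62, 0.68, 0.70, 0.75],
    "F": [0.70, 0.80, 0.86, 0.95],
    "V": [1.90, 2.00, 2.10, 2.20],
    "N": [-0.60, -0.70, -0.75, -0.85],
}
r = correlation_matrix(slopes_by_set)
```

In this made-up example a student who is steep on S, F, and V gives a
more negative slope on N, so the entries involving N are negative even
though the tendency is the same.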

The individual-to-individual variability in slope and in
intercept was near the standard error provided by least squares
for the four data sets. Using comparable measures of variability,
that for slopes was 0.9 times and that for intercepts
was 0.7 times the least-squares standard error. Admittedly
no theory encourages us to believe in such relations, and
further empirical work might be instructive.